Three Dimensional Pseudo-Spectral Compressible Magnetohydrodynamic GPU Code for Astrophysical Plasma Simulation
This paper presents benchmarking and scaling studies of a GPU-accelerated
three-dimensional compressible magnetohydrodynamic code. The code is developed
with the aim of explaining large- and intermediate-scale magnetic field
generation in the cosmos, as well as in nuclear fusion reactors, in light of
the theory given by Eugene Newman Parker. Spatial derivatives are computed
with a pseudo-spectral method, and the time solvers are explicit. GPU
acceleration is achieved with minimal code changes through OpenACC
parallelization and use of the NVIDIA CUDA Fast Fourier Transform library (cuFFT).
NVIDIA's unified memory is leveraged to enable over-subscription of the GPU
device memory for seamless out-of-core processing of large grids. Our
experimental results indicate that the GPU-accelerated code achieves
up to two orders of magnitude speedup over a corresponding OpenMP-parallel,
FFTW-based code on an NVIDIA Tesla P100 GPU. For large grids that require
out-of-core processing on the GPU, we see a 7x speedup over the OpenMP, FFTW-based
code on the Tesla P100 GPU. We also present performance analysis of the
GPU-accelerated code on different GPU architectures: Kepler, Pascal, and Volta.
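The pseudo-spectral approach this abstract refers to computes spatial derivatives by transforming to Fourier space, multiplying each mode by i·k, and transforming back. The following is a minimal CPU sketch in pure Python, with a naive DFT standing in for cuFFT/FFTW; the function names are illustrative and are not taken from the paper's code:

```python
import cmath
import math

def dft(f):
    # naive discrete Fourier transform, O(N^2); stands in for cuFFT/FFTW
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(F):
    # inverse transform with the conventional 1/N normalisation
    N = len(F)
    return [sum(F[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def spectral_derivative(f):
    # multiply each Fourier mode by i*k -- the core of a pseudo-spectral derivative
    N = len(f)
    F = dft(f)
    G = []
    for k in range(N):
        k_phys = k if k < N // 2 else k - N   # signed wavenumber
        if k == N // 2:
            k_phys = 0                        # zero the Nyquist mode
        G.append(1j * k_phys * F[k])
    return [v.real for v in idft(G)]

# differentiate sin(x) on a periodic grid; the result should match cos(x)
N = 16
x = [2 * math.pi * n / N for n in range(N)]
df = spectral_derivative([math.sin(xi) for xi in x])
assert all(abs(d - math.cos(xi)) < 1e-9 for d, xi in zip(df, x))
```

For smooth periodic fields this derivative is accurate to machine precision, which is why pseudo-spectral MHD codes spend most of their time in the FFT library; GPU acceleration of that step via cuFFT targets the dominant cost directly.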
On the performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia
This report presents a comprehensive analysis of the performance of GPU-accelerated
meshfree CFD solvers for two-dimensional compressible flows in
Fortran, C++, Python, and Julia. The programming model CUDA is used to develop
the GPU codes. The meshfree solver is based on the least squares kinetic upwind
method with entropy variables (q-LSKUM). To assess the computational efficiency
of the GPU solvers and to compare their relative performance, benchmark
calculations are performed on seven levels of point distribution. To analyse
the difference in their run-times, the computationally intensive kernel is
profiled. Various performance metrics are investigated from the profiled data
to determine the cause of observed variation in run-times. To address some of
the performance related issues, various optimisation strategies are employed.
The optimised GPU codes are compared with the naive codes, and conclusions are
drawn from their performance.
Comment: 42 pages, 3 figures
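At the heart of a meshfree solver of this kind are least-squares formulae that estimate spatial derivatives from a cloud of neighbouring points rather than from a mesh. Below is a minimal illustrative sketch of such a least-squares derivative estimate; it is not the paper's q-LSKUM implementation, and the function name and sample points are invented for the example:

```python
def ls_derivatives(p0, f0, nbrs):
    # least-squares estimate of (df/dx, df/dy) at p0 from scattered neighbours:
    # minimise sum_i (df_i - fx*dx_i - fy*dy_i)^2 and solve the 2x2 normal equations
    sxx = sxy = syy = sxf = syf = 0.0
    for (x, y), f in nbrs:
        dx, dy, df = x - p0[0], y - p0[1], f - f0
        sxx += dx * dx; sxy += dx * dy; syy += dy * dy
        sxf += dx * df; syf += dy * df
    det = sxx * syy - sxy * sxy            # assumes neighbours are not collinear
    fx = (syy * sxf - sxy * syf) / det
    fy = (sxx * syf - sxy * sxf) / det
    return fx, fy

# with a linear field f = 2x + 3y, the least-squares gradient is recovered exactly
f = lambda x, y: 2.0 * x + 3.0 * y
p0 = (0.0, 0.0)
nbrs = [((0.1, 0.0), f(0.1, 0.0)), ((0.0, 0.1), f(0.0, 0.1)),
        ((-0.1, 0.05), f(-0.1, 0.05)), ((0.05, -0.1), f(0.05, -0.1))]
fx, fy = ls_derivatives(p0, f(*p0), nbrs)
assert abs(fx - 2.0) < 1e-9 and abs(fy - 3.0) < 1e-9
```

Because this accumulation over neighbours is independent at every point, it maps naturally onto one GPU thread per point, which is why such kernels dominate the profiles the report analyses.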
Learn CUDA programming: a beginner's guide to GPU programming and parallel computing with CUDA 10.x and C/C++
This book is for programmers who want to delve into parallel computing, become part of the high-performance computing community, and apply those techniques to build modern applications. Experience with C++ programming is assumed. There are some sample examples of equivalent Fortran code. For Deep Learning enthusiasts, Python-based sample code is ..